[fix](query-cache) include variant subcolumn path in query cache digest #61709

Merged
924060929 merged 1 commit into apache:master from 924060929:query-cache-variant
Mar 25, 2026
@924060929
Contributor

Queries on different subcolumns of the same variant column (e.g. data['int_1'] vs data['int_nested']) generated the same cache digest because normalizeSelectColumns() used only the base column name. As a result, the query cache could return the cached result of one subcolumn for a query on another.

Fix: include the variant subcolumn path in the normalized select column name so that different subcolumns produce different cache digests.
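
To illustrate the collision described above, here is a minimal sketch. It is not the Doris implementation: the class and method names (`VariantDigestSketch`, `normalizeBaseOnly`, `normalizeWithPath`), the `.` separator, and the use of SHA-256 are all assumptions made for the example. It only shows why a normalized name built from the base column alone collides, and why appending the subcolumn path separates the digests.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

public class VariantDigestSketch {

    // Hypothetical stand-in for the old behavior: the subcolumn path is ignored,
    // so every subcolumn of the same variant column gets the same normalized name.
    static String normalizeBaseOnly(String baseColumn, List<String> subPath) {
        return baseColumn;
    }

    // Hypothetical stand-in for the fix: the subcolumn path is part of the name.
    // The "." separator is an assumption made for this sketch.
    static String normalizeWithPath(String baseColumn, List<String> subPath) {
        if (subPath.isEmpty()) {
            return baseColumn;
        }
        return baseColumn + "." + String.join(".", subPath);
    }

    // Hash the normalized name and return it as lowercase hex.
    // SHA-256 is an assumption for this sketch.
    static String digest(String normalized) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(normalized.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> a = List.of("int_1");      // data['int_1']
        List<String> b = List.of("int_nested"); // data['int_nested']

        // Base name only: both queries hash "data", so the digests collide.
        System.out.println(digest(normalizeBaseOnly("data", a))
                .equals(digest(normalizeBaseOnly("data", b)))); // prints "true"

        // With the path: "data.int_1" vs "data.int_nested", so the digests differ.
        System.out.println(digest(normalizeWithPath("data", a))
                .equals(digest(normalizeWithPath("data", b)))); // prints "false"
    }
}
```

A plain string separator can be ambiguous if a subcolumn key itself contains the separator character; a real implementation would need an unambiguous encoding of the path, which this sketch does not attempt.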

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Mar 25, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@924060929
Contributor Author

run buildall

@github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) on Mar 25, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@doris-robot

TPC-H: Total hot run time: 26546 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1e3fae7b7d26c0e82b02b042fddc6f3d0f836acf, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17604	4562	4304	4304
q2	
q3	10641	781	534	534
q4	4679	351	251	251
q5	7562	1222	1031	1031
q6	180	175	147	147
q7	776	869	659	659
q8	9304	1462	1288	1288
q9	4948	4797	4735	4735
q10	6255	1932	1647	1647
q11	463	269	245	245
q12	719	588	466	466
q13	18033	2692	1952	1952
q14	224	233	212	212
q15	
q16	721	748	655	655
q17	733	851	439	439
q18	5977	5375	5295	5295
q19	1114	994	622	622
q20	542	490	375	375
q21	4623	1844	1389	1389
q22	341	506	300	300
Total cold run time: 95439 ms
Total hot run time: 26546 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4798	4695	4594	4594
q2	
q3	3911	4347	3833	3833
q4	878	1213	778	778
q5	4041	4418	4339	4339
q6	196	177	141	141
q7	1783	1676	1526	1526
q8	2487	2718	2540	2540
q9	7804	7377	7370	7370
q10	3879	4114	3671	3671
q11	524	436	425	425
q12	500	611	457	457
q13	2519	2851	2116	2116
q14	302	313	270	270
q15	
q16	726	765	718	718
q17	1173	1306	1345	1306
q18	7080	6720	6535	6535
q19	913	857	937	857
q20	2055	2141	2041	2041
q21	3939	3554	3381	3381
q22	451	467	448	448
Total cold run time: 49959 ms
Total hot run time: 47346 ms

@doris-robot

TPC-DS: Total hot run time: 168736 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1e3fae7b7d26c0e82b02b042fddc6f3d0f836acf, data reload: false

query5	4346	632	520	520
query6	358	237	212	212
query7	4224	480	261	261
query8	349	253	235	235
query9	8724	2690	2681	2681
query10	529	415	326	326
query11	6962	5092	4884	4884
query12	183	127	125	125
query13	1277	469	348	348
query14	6122	3704	3489	3489
query14_1	2861	2781	2862	2781
query15	205	192	187	187
query16	1019	468	462	462
query17	1124	747	623	623
query18	2727	458	350	350
query19	231	216	189	189
query20	147	131	127	127
query21	214	134	111	111
query22	13126	13972	14729	13972
query23	16837	16307	16113	16113
query23_1	15945	15694	15740	15694
query24	7158	1613	1224	1224
query24_1	1252	1214	1234	1214
query25	525	472	398	398
query26	1257	271	150	150
query27	2742	476	291	291
query28	4431	1821	1821	1821
query29	818	562	476	476
query30	299	226	193	193
query31	1028	935	873	873
query32	82	70	77	70
query33	520	342	289	289
query34	885	900	511	511
query35	652	693	601	601
query36	1071	1095	984	984
query37	142	95	85	85
query38	2922	2935	2864	2864
query39	863	844	799	799
query39_1	790	807	812	807
query40	243	152	134	134
query41	65	60	59	59
query42	258	255	250	250
query43	238	247	214	214
query44	
query45	208	185	185	185
query46	876	981	615	615
query47	2121	2558	2063	2063
query48	306	309	226	226
query49	629	449	391	391
query50	676	272	214	214
query51	4173	4069	4059	4059
query52	265	264	260	260
query53	293	347	280	280
query54	295	275	258	258
query55	86	89	85	85
query56	331	318	304	304
query57	1916	1813	1757	1757
query58	281	277	272	272
query59	2804	2946	2749	2749
query60	347	336	331	331
query61	163	157	157	157
query62	642	595	541	541
query63	307	278	275	275
query64	4935	1293	1000	1000
query65	
query66	1391	461	367	367
query67	24315	24248	24191	24191
query68	
query69	423	311	280	280
query70	961	955	899	899
query71	342	320	302	302
query72	2799	2721	2374	2374
query73	531	537	309	309
query74	9613	9593	9382	9382
query75	2843	2791	2447	2447
query76	2268	1021	681	681
query77	361	405	313	313
query78	10916	11164	10475	10475
query79	1108	762	574	574
query80	826	622	550	550
query81	524	262	226	226
query82	1320	172	118	118
query83	352	267	252	252
query84	249	135	107	107
query85	997	496	449	449
query86	436	339	289	289
query87	3121	3129	3045	3045
query88	3509	2632	2629	2629
query89	421	364	341	341
query90	1729	173	178	173
query91	169	166	139	139
query92	79	74	73	73
query93	896	858	485	485
query94	514	320	284	284
query95	583	339	316	316
query96	643	511	228	228
query97	2458	2467	2386	2386
query98	242	222	216	216
query99	1012	928	908	908
Total cold run time: 249389 ms
Total hot run time: 168736 ms

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 83.33% (5/6) 🎉
Increment coverage report
Complete coverage report

Contributor

@HappenLee HappenLee left a comment

LGTM

@924060929 merged commit c945160 into apache:master on Mar 25, 2026
31 of 33 checks passed
@924060929 deleted the query-cache-variant branch on March 25, 2026 08:57
github-actions bot pushed a commit that referenced this pull request Mar 25, 2026
…st (#61709)

Different variant subcolumn queries (e.g. data['int_1'] vs
data['int_nested']) were generating the same cache digest because
normalizeSelectColumns() only used the base column name. This caused
query cache to return wrong results when different subcolumns of the
same variant column were queried.

Fix: include the variant subcolumn path in the normalized select column
name so that different subcolumns produce different cache digests.
github-actions bot pushed a commit that referenced this pull request Mar 25, 2026
…st (#61709)
Labels

approved (Indicates a PR has been approved by one committer.), dev/4.0.x, dev/4.1.x, reviewed
